Modeling vowel duration for Japanese text-to-speech synthesis
Authors
Abstract
Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing and the effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of both long and short vowels. A Sum-of-Products approach is used to model key interactions observed in the data and to predict values for factor combinations not found in the speech database. We report root mean squared deviations between observed and predicted durations ranging from 8 to 15 ms, and an overall correlation of 0.89.
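As a rough illustration of the Sum-of-Products idea mentioned in the abstract, the sketch below predicts a duration as a sum of terms, each of which is a product of per-factor parameters. Because every parameter belongs to a single factor level, the model can produce a value for a factor combination that never occurred in the training data. All factor names, table names, and numeric values here are hypothetical and chosen only for illustration; they are not taken from the paper. The two helper functions show how a root mean squared deviation and a correlation between observed and predicted durations could be computed.

```python
import math

# Hypothetical per-factor parameters (illustrative values, NOT from the paper).
INTRINSIC_MS = {"a": 80.0, "i": 60.0, "u": 55.0}            # base duration per vowel
ACCENT_SCALE = {"accented": 1.10, "unaccented": 1.00}       # multiplicative factor
POSITION_SCALE = {"phrase_final": 1.25, "phrase_medial": 1.00}
NEXT_PHONE_OFFSET_MS = {"voiced": 8.0, "voiceless": 0.0}    # additive term


def predict_duration(vowel, accent, position, next_phone):
    """Sum of two product terms: (identity * accent * position) + (context)."""
    product_term = (
        INTRINSIC_MS[vowel] * ACCENT_SCALE[accent] * POSITION_SCALE[position]
    )
    additive_term = NEXT_PHONE_OFFSET_MS[next_phone]
    return product_term + additive_term


def rmsd(observed, predicted):
    """Root mean squared deviation between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Predict a combination regardless of whether it was seen in training data.
    print(predict_duration("a", "accented", "phrase_final", "voiced"))
```

In a real system the parameters would be estimated from a labeled speech database rather than set by hand, and the choice of which factors multiply and which add would follow from interactions observed in the data.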
Similar resources
Modeling segmental durations for Japanese text-to-speech synthesis
Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of segmental duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of...
A study on automatic detection of Japanese vowel devoicing for speech synthesis
In corpus-based speech synthesis, the quality of the synthetic speech critically depends on the speech corpus. Since high vowels in Japanese may be devoiced in real speech, we should detect and transcribe them automatically during corpus construction. In this paper, we apply an HMM-based method and adopt two kinds of likelihood differences as voicing measures for different focuses. T...
Syllable-based acoustic modeling for Japanese spontaneous speech recognition
We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted for Japanese read speech recognition systems. In this paper, syllable-based and mora-based units are clearly distinguished in their definitions, and syllables are shown to be more suitable as an acoustic model for Japanese spontaneous ...
A Japanese text-to-speech system based on multi-form units with consideration of frequency distribution in Japanese
This paper proposes a new text-to-speech (TTS) system that concatenates large numbers of speech segments to produce very natural and intelligible synthetic speech. One novel point of our system is its new synthesis unit, which has three remarkable characteristics, as follows. The synthesis units contain all Japanese syllables together with all possible vowel sequences, so very smooth synthe...
Learning Phonemic Vowel Length from Naturalistic Recordings of Japanese Infant-Directed Speech
In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast using simple distributional analyses, there should be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences salient enough in the input. In this study, we evaluate these requirements of phonemic learn...